Detection of word fragments in Mandarin telephone conversation

نویسندگان

  • Cheng-Tao Chu
  • Yun-Hsuan Sung
  • Yuan Zhao
  • Daniel Jurafsky
چکیده

We describe preliminary work on the detection of word fragments in Mandarin conversational telephone speech. We extracted prosodic, voice quality, and lexical features, and trained Decision Tree and SVM classifiers. Previous research shows that glottalization features are instrumental in English fragment detection. However, we show that Mandarin fragments are quite different than English; 90% of Mandarin fragments are followed immediately by a repetition of the fragmentary word. These repetition fragments are not glottalized, and they have a very specific distribution; the 12 most frequent words (“you”, “I”, “that”, “have”, “then”, etc.) cover 50% of the tokens of these fragments. Thus rather than glottalization, we found the most useful feature for Mandarin fragment detection was the identity of the neighboring character (word or morpheme). In an oracle experiment using the true (reference) neighboring words as well as prosodic and voice quality features, we achieved 80% accuracy in Mandarin fragment detection.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus

The paper describes the design, collection, transcription and analysis of 200 hours of HKUST Mandarin Telephone Speech Corpus (HKUST/MTS) from over 2100 Mandarin speakers in mainland China under the DARPA EARS framework. The corpus includes speech data, transcriptions and speaker demographic information. The speech data include 1206 ten-minute natural Mandarin conversations between either stran...

متن کامل

Vocabulary Lists for EAP and Conversation Students

Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...

متن کامل

Symbol Sequence Search from Telephone Conversation

We propose a method for searching for symbol sequences in conversations. Symbol sequences can include phone numbers, credit card numbers, and any kind of ticket (identification) numbers and are often communicated in call center conversations. Automatic extraction of these from speech is a key to many automatic speech recognition (ASR) applications such as question answering and summarization. C...

متن کامل

Telephone Conversation Closing Strategies Used by Persian Speakers: Rapport Management Approach

The use of politeness strategies can help interlocutors promote and/or maintain social harmony in telephone interactions. Using the Rapport Management Model proposed by Spencer-Oatey (2008), this study aimed primarily to reinvestigate the closing structures of telephone conversation (hereafter abbreviated as TC) in Persian and to discover the common politeness strategies used by native Persian ...

متن کامل

Telephone Conversation Closing Strategies Used by Persian Speakers: Rapport Management Approach

The use of politeness strategies can help interlocutors promote and/or maintain social harmony in telephone interactions. Using the Rapport Management Model proposed by Spencer-Oatey (2008), this study aimed primarily to reinvestigate the closing structures of telephone conversation (hereafter abbreviated as TC) in Persian and to discover the common politeness strategies used by native Persian ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006